From source code identifiers to natural language terms
Authors
Abstract
Program comprehension techniques often explore program identifiers to infer knowledge about programs. The literature has already established that source code identifiers are a relevant source of information about programs and that they directly affect future comprehension tasks. Most programming languages enforce some constraints on identifier strings (e.g., white space and commas are not allowed). Programmers also often combine words and abbreviations to devise strings that represent single or multiple domain concepts, in order to increase programming linguistic efficiency (conveying more semantics while writing less). These strings do not always use explicit marks to distinguish the terms used (e.g., CamelCase or underscores), so techniques often referred to as hard splitting are not enough. This paper introduces Lingua::IdSplitter, a dictionary-based algorithm for splitting and expanding the strings that compose multi-term identifiers. It explores the use of general programming and abbreviation dictionaries, as well as a custom dictionary automatically generated from the software's natural language content, which is likely to include application domain terms and specific abbreviations. This approach was applied to two software packages written in C, achieving an f-measure of around 90% for correctly splitting and expanding identifiers. A comparison with current state-of-the-art approaches is also presented. © 2014 Elsevier Inc. All rights reserved.
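The difference between hard splitting and dictionary-based splitting with expansion can be illustrated with a minimal Python sketch. It is not the Lingua::IdSplitter implementation: the function names (`hard_split`, `soft_split`, `split_identifier`), the toy dictionaries, and the "fewest terms wins" rule are all assumptions made for illustration only.

```python
import re

# Hypothetical toy dictionaries; the dictionaries used in the paper are not reproduced here.
WORDS = {"time", "sort", "file", "name", "get", "user"}
ABBREVIATIONS = {"usr": "user", "fn": "function", "str": "string", "cnt": "count"}


def hard_split(identifier):
    """Split only on explicit marks: underscores and CamelCase boundaries."""
    parts = []
    for chunk in re.split(r"[_\W]+", identifier):
        # Acronym runs, capitalized or lowercase words, and digit runs.
        parts.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", chunk))
    return [p.lower() for p in parts if p]


def soft_split(token, vocabulary):
    """Segment a token into the fewest known terms (assumed tie-breaking rule).

    Returns [token] unchanged when no full segmentation exists.
    """
    best = [None] * (len(token) + 1)  # best[i] holds a segmentation of token[:i]
    best[0] = []
    for end in range(1, len(token) + 1):
        for start in range(end):
            piece = token[start:end]
            if best[start] is not None and piece in vocabulary:
                candidate = best[start] + [piece]
                if best[end] is None or len(candidate) < len(best[end]):
                    best[end] = candidate
    return best[-1] if best[-1] is not None else [token]


def split_identifier(identifier):
    """Hard split first, then dictionary split each token and expand abbreviations."""
    vocabulary = WORDS | set(ABBREVIATIONS)
    terms = []
    for token in hard_split(identifier):
        for piece in soft_split(token, vocabulary):
            terms.append(ABBREVIATIONS.get(piece, piece))
    return terms


if __name__ == "__main__":
    for name in ("getUserName", "timesort", "usrfilename", "get_usr_cnt"):
        print(name, "->", split_identifier(name))
```

With these toy dictionaries, `timesort` has no explicit marks, so hard splitting alone leaves it as a single token, while the dictionary step separates it into `time` and `sort`. A real system would need much larger dictionaries and a more careful way of ranking competing segmentations.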
Similar resources
Probabilistic SynSet Based Concept Location
Concept location is a common task in program comprehension techniques, essential in many approaches used for software care and software evolution. An important goal of this process is to discover a mapping between source code and human oriented concepts. Although programs are written in a strict and formal language, natural language terms and sentences like identifiers (variables or functions n...
Identifying Idioms of Source Code Identifier in Java Context
This paper presents an approach to identifying domain word POS (Part of Speech) and idiom code identifiers written in the Java programming language. To detect them, we extracted common identifiers from 14 Java API documents, and applied diverse filters. In addition, NLP (Natural Language Parser) has been used to detect common mistakes in the Java API documents. As a result, this paper identified 8...
Supporting Concept Extraction and Identifier Quality Improvement through Programmers' Lexicon Analysis
Identifiers play an important role in communicating the intentions associated with the program entities they represent. The information captured in identifiers supports programmers in (re-)building the “mental model” of the software and facilitates understanding. (Re-)building the “mental model” and understanding large software, however, is difficult and expensive. Besides, the effort involved in t...
The impact of vocabulary normalization
Software development, evolution, and maintenance depend on ever increasing tool support. Recent tools have incorporated increasing analysis of the natural language found in source code, predominately in the identifiers and comments. However, when coders combine abbreviations and acronyms to form multiword identifiers, they, in essence, invent new vocabulary making the source code’s vocabulary d...
Context Awareness for Effective Software Structure Quality
This paper presents an approach that helps developers keep source code identifiers and comments consistent with high-level artifacts. This approach calculates and shows the textual similarity between source code and related artifacts. The assumption is that developers are thereby induced to improve the source code lexicon (terms) used in identifiers or comments. The software development environment provides i...
Journal title:
- Journal of Systems and Software
Volume 100
Publication date: 2015